

@manastasova (Contributor) commented Aug 15, 2025

Description of changes:

Add x86-optimized Keccak implementation from s2n-bignum

This PR integrates optimized and formally verified x86 Keccak code from the s2n-bignum library. The code has been imported as-is using our importer script and includes all changes up to commit 717b57ac643327489d2ac7dd4022a64b5dfb2d8f.

Key aspects:

  • Adds an assembly-optimized Keccak implementation for x86 platforms
  • No changes made to the imported code
  • Enables SHA3/SHAKE to use the Keccak assembly on x86 platforms running UNIX-like operating systems
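For reference, the "C Implementation" baseline in the benchmarks below is a portable software version of the Keccak-f[1600] permutation, which the imported assembly replaces. The sketch below is not AWS-LC's keccak1600.c; it follows the compact formulation found in widely used public-domain reference code, and the function name `keccak_f1600_portable` is hypothetical.

```c
#include <stdint.h>

/* iota round constants for the 24 rounds of Keccak-f[1600]. */
static const uint64_t kKeccakRC[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808AULL,
    0x8000000080008000ULL, 0x000000000000808BULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008AULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000AULL,
    0x000000008000808BULL, 0x800000000000008BULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800AULL, 0x800000008000000AULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL};

/* rho rotation offsets and pi lane order, in the "walk one temporary
 * through all lanes" formulation. */
static const unsigned kKeccakRotc[24] = {1,  3,  6,  10, 15, 21, 28, 36,
                                         45, 55, 2,  14, 27, 41, 56, 8,
                                         25, 43, 62, 18, 39, 61, 20, 44};
static const unsigned kKeccakPiln[24] = {10, 7,  11, 17, 18, 3,  5,  16,
                                         8,  21, 24, 4,  15, 23, 19, 13,
                                         12, 2,  20, 14, 22, 9,  6,  1};

static uint64_t rotl64(uint64_t x, unsigned n) {
  /* n is always in [1, 62] here, so neither shift is by 64. */
  return (x << n) | (x >> (64 - n));
}

/* Permutes the 25-lane state in place; lane (x, y) is st[x + 5 * y]. */
void keccak_f1600_portable(uint64_t st[25]) {
  uint64_t bc[5], t;
  for (int round = 0; round < 24; round++) {
    /* theta: XOR each lane with the parities of two neighbouring columns. */
    for (int i = 0; i < 5; i++)
      bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
    for (int i = 0; i < 5; i++) {
      t = bc[(i + 4) % 5] ^ rotl64(bc[(i + 1) % 5], 1);
      for (int j = 0; j < 25; j += 5) st[j + i] ^= t;
    }
    /* rho + pi: rotate each lane and move it to its new position. */
    t = st[1];
    for (int i = 0; i < 24; i++) {
      unsigned j = kKeccakPiln[i];
      bc[0] = st[j];
      st[j] = rotl64(t, kKeccakRotc[i]);
      t = bc[0];
    }
    /* chi: the only non-linear step, applied row by row. */
    for (int j = 0; j < 25; j += 5) {
      for (int i = 0; i < 5; i++) bc[i] = st[j + i];
      for (int i = 0; i < 5; i++)
        st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
    }
    /* iota: XOR in the round constant. */
    st[0] ^= kKeccakRC[round];
  }
}
```

Applying it once to the all-zero state should yield 0xF1258F7940E1DDE7 in lane 0, the first word of the published Keccak-f[1600] test vector. This gives a quick cross-check against any assembly replacement.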

Testing:

ninja && ./crypto/crypto_test
./tool/bssl speed -filter {SHA3-224, ...}

SHA3 Performance Comparison

ASM Implementation vs C Implementation

| Algorithm | Input size (bytes) | ASM (ops/sec) | ASM (MB/s) | C (ops/sec) | C (MB/s) | Speedup (%) |
|---|---|---|---|---|---|---|
| SHA3-224 | 16 | 3,367,963.0 | 53.9 | 3,084,636.0 | 49.4 | 9.2 |
| SHA3-224 | 256 | 1,748,043.8 | 447.5 | 1,629,982.1 | 417.3 | 7.2 |
| SHA3-224 | 1350 | 363,247.7 | 490.4 | 343,366.1 | 463.5 | 5.8 |
| SHA3-224 | 8192 | 63,937.3 | 523.8 | 60,643.4 | 496.8 | 5.4 |
| SHA3-224 | 16384 | 31,957.9 | 523.6 | 30,315.5 | 496.7 | 5.4 |
| SHA3-256 | 16 | 3,400,705.8 | 54.4 | 3,106,263.8 | 49.7 | 9.5 |
| SHA3-256 | 256 | 1,762,683.0 | 451.2 | 1,634,490.0 | 418.4 | 7.8 |
| SHA3-256 | 1350 | 365,010.1 | 492.8 | 342,906.0 | 462.9 | 6.4 |
| SHA3-256 | 8192 | 60,317.3 | 494.1 | 56,481.2 | 462.7 | 6.8 |
| SHA3-256 | 16384 | 30,437.0 | 498.7 | 28,552.4 | 467.8 | 6.6 |
| SHA3-384 | 16 | 3,416,610.5 | 54.7 | 3,103,572.4 | 49.7 | 10.1 |
| SHA3-384 | 256 | 1,206,249.7 | 308.8 | 1,124,389.5 | 287.8 | 7.3 |
| SHA3-384 | 1350 | 284,628.8 | 384.2 | 261,900.7 | 353.6 | 8.7 |
| SHA3-384 | 8192 | 47,109.4 | 385.9 | 43,821.1 | 359.0 | 7.5 |
| SHA3-384 | 16384 | 23,612.4 | 386.9 | 21,844.8 | 357.9 | 8.1 |
| SHA3-512 | 16 | 3,435,003.8 | 55.0 | 3,132,639.7 | 50.1 | 9.7 |
| SHA3-512 | 256 | 921,408.5 | 235.9 | 855,101.3 | 218.9 | 7.8 |
| SHA3-512 | 1350 | 197,272.7 | 266.3 | 181,839.8 | 245.5 | 8.5 |
| SHA3-512 | 8192 | 32,652.0 | 267.5 | 30,504.6 | 249.9 | 7.0 |
| SHA3-512 | 16384 | 16,427.6 | 269.2 | 15,310.5 | 250.8 | 7.3 |
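The Speedup column appears to be the relative throughput gain of the assembly path, (ASM ops/sec / C ops/sec - 1) * 100. Every row in the table above is consistent with this formula. The helper names below are hypothetical. The second function converts the same numbers into the reduction in time per operation, which is slightly smaller.

```c
/* Relative throughput gain of ASM over C, in percent. */
double speedup_percent(double asm_ops_per_sec, double c_ops_per_sec) {
  return (asm_ops_per_sec / c_ops_per_sec - 1.0) * 100.0;
}

/* Equivalent reduction in time per operation, in percent. */
double time_saved_percent(double asm_ops_per_sec, double c_ops_per_sec) {
  return (1.0 - c_ops_per_sec / asm_ops_per_sec) * 100.0;
}
```

For SHA3-224 on 16-byte inputs, the 9.2% throughput gain corresponds to about 8.4% less time per hash.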

SHA3 Performance: Details

ASM Implementation

./tool/bssl speed -filter SHA3-224
Did 3368000 SHA3-224 (16 bytes) operations in 1000011us (3367963.0 ops/sec): 53.9 MB/s
Did 1749000 SHA3-224 (256 bytes) operations in 1000547us (1748043.8 ops/sec): 447.5 MB/s
Did 364000 SHA3-224 (1350 bytes) operations in 1002071us (363247.7 ops/sec): 490.4 MB/s
Did 64000 SHA3-224 (8192 bytes) operations in 1000980us (63937.3 ops/sec): 523.8 MB/s
Did 32000 SHA3-224 (16384 bytes) operations in 1001318us (31957.9 ops/sec): 523.6 MB/s
./tool/bssl speed -filter SHA3-256
Did 3400750 SHA3-256 (16 bytes) operations in 1000013us (3400705.8 ops/sec): 54.4 MB/s
Did 1762750 SHA3-256 (256 bytes) operations in 1000038us (1762683.0 ops/sec): 451.2 MB/s
Did 366000 SHA3-256 (1350 bytes) operations in 1002712us (365010.1 ops/sec): 492.8 MB/s
Did 61000 SHA3-256 (8192 bytes) operations in 1011319us (60317.3 ops/sec): 494.1 MB/s
Did 31000 SHA3-256 (16384 bytes) operations in 1018496us (30437.0 ops/sec): 498.7 MB/s
./tool/bssl speed -filter SHA3-384
Did 3417000 SHA3-384 (16 bytes) operations in 1000114us (3416610.5 ops/sec): 54.7 MB/s
Did 1207000 SHA3-384 (256 bytes) operations in 1000622us (1206249.7 ops/sec): 308.8 MB/s
Did 285000 SHA3-384 (1350 bytes) operations in 1001304us (284628.8 ops/sec): 384.2 MB/s
Did 48000 SHA3-384 (8192 bytes) operations in 1018905us (47109.4 ops/sec): 385.9 MB/s
Did 24000 SHA3-384 (16384 bytes) operations in 1016415us (23612.4 ops/sec): 386.9 MB/s
./tool/bssl speed -filter SHA3-512
Did 3436000 SHA3-512 (16 bytes) operations in 1000290us (3435003.8 ops/sec): 55.0 MB/s
Did 922000 SHA3-512 (256 bytes) operations in 1000642us (921408.5 ops/sec): 235.9 MB/s
Did 198000 SHA3-512 (1350 bytes) operations in 1003687us (197272.7 ops/sec): 266.3 MB/s
Did 33000 SHA3-512 (8192 bytes) operations in 1010657us (32652.0 ops/sec): 267.5 MB/s
Did 17000 SHA3-512 (16384 bytes) operations in 1034842us (16427.6 ops/sec): 269.2 MB/s

C Implementation

./tool/bssl speed -filter SHA3-224
Did 3085000 SHA3-224 (16 bytes) operations in 1000118us (3084636.0 ops/sec): 49.4 MB/s
Did 1630000 SHA3-224 (256 bytes) operations in 1000011us (1629982.1 ops/sec): 417.3 MB/s
Did 344000 SHA3-224 (1350 bytes) operations in 1001846us (343366.1 ops/sec): 463.5 MB/s
Did 61000 SHA3-224 (8192 bytes) operations in 1005881us (60643.4 ops/sec): 496.8 MB/s
Did 31000 SHA3-224 (16384 bytes) operations in 1022580us (30315.5 ops/sec): 496.7 MB/s
./tool/bssl speed -filter SHA3-256
Did 3107000 SHA3-256 (16 bytes) operations in 1000237us (3106263.8 ops/sec): 49.7 MB/s
Did 1635000 SHA3-256 (256 bytes) operations in 1000312us (1634490.0 ops/sec): 418.4 MB/s
Did 343000 SHA3-256 (1350 bytes) operations in 1000274us (342906.0 ops/sec): 462.9 MB/s
Did 57000 SHA3-256 (8192 bytes) operations in 1009186us (56481.2 ops/sec): 462.7 MB/s
Did 29000 SHA3-256 (16384 bytes) operations in 1015676us (28552.4 ops/sec): 467.8 MB/s
./tool/bssl speed -filter SHA3-384
Did 3105000 SHA3-384 (16 bytes) operations in 1000460us (3103572.4 ops/sec): 49.7 MB/s
Did 1125000 SHA3-384 (256 bytes) operations in 1000543us (1124389.5 ops/sec): 287.8 MB/s
Did 262000 SHA3-384 (1350 bytes) operations in 1000379us (261900.7 ops/sec): 353.6 MB/s
Did 44000 SHA3-384 (8192 bytes) operations in 1004083us (43821.1 ops/sec): 359.0 MB/s
Did 22000 SHA3-384 (16384 bytes) operations in 1007105us (21844.8 ops/sec): 357.9 MB/s
./tool/bssl speed -filter SHA3-512
Did 3133000 SHA3-512 (16 bytes) operations in 1000115us (3132639.7 ops/sec): 50.1 MB/s
Did 856000 SHA3-512 (256 bytes) operations in 1001051us (855101.3 ops/sec): 218.9 MB/s
Did 182000 SHA3-512 (1350 bytes) operations in 1000881us (181839.8 ops/sec): 245.5 MB/s
Did 31000 SHA3-512 (8192 bytes) operations in 1016241us (30504.6 ops/sec): 249.9 MB/s
Did 16000 SHA3-512 (16384 bytes) operations in 1045032us (15310.5 ops/sec): 250.8 MB/s
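Each `bssl speed` line reports an operation count and an elapsed time in microseconds. The ops/sec and MB/s figures follow directly from those two values. The logs are consistent with MB meaning 10^6 bytes, not 2^20. The sketch below reproduces that arithmetic; the type and function names are hypothetical and not part of `bssl`.

```c
#include <stdint.h>

/* Throughput derived from one "Did N ... operations in Tus" line. */
typedef struct {
  double ops_per_sec;
  double mb_per_sec; /* 10^6 bytes per second */
} bench_throughput;

bench_throughput throughput_from_bssl_line(uint64_t ops, uint64_t elapsed_us,
                                           uint64_t bytes_per_op) {
  bench_throughput t;
  t.ops_per_sec = (double)ops * 1e6 / (double)elapsed_us;
  t.mb_per_sec = t.ops_per_sec * (double)bytes_per_op / 1e6;
  return t;
}
```

For example, 3368000 operations in 1000011us on 16-byte inputs gives about 3367963.0 ops/sec and 53.9 MB/s. This matches the first line of the ASM SHA3-224 log.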

x86 ASM ML-KEM with C vs. ASM SHA3/SHAKE Performance

ASM SHA3/SHAKE vs C SHA3/SHAKE

| Algorithm | Operation | ASM SHA3 (ops/sec) | C SHA3 (ops/sec) | Speedup (%) |
|---|---|---|---|---|
| ML-KEM-512 | keygen | 99,584.3 | 93,781.9 | 6.2 |
| ML-KEM-512 | encaps | 89,685.4 | 85,265.5 | 5.2 |
| ML-KEM-512 | decaps | 75,576.0 | 70,990.9 | 6.5 |
| ML-KEM-768 | keygen | 63,174.4 | 59,462.6 | 6.2 |
| ML-KEM-768 | encaps | 58,906.5 | 55,701.2 | 5.8 |
| ML-KEM-768 | decaps | 49,780.5 | 46,396.9 | 7.3 |
| ML-KEM-1024 | keygen | 42,521.6 | 39,401.6 | 7.9 |
| ML-KEM-1024 | encaps | 39,878.3 | 37,049.2 | 7.6 |
| ML-KEM-1024 | decaps | 33,794.9 | 31,636.3 | 6.8 |

x86 ASM ML-KEM with ASM SHA3 Implementation

./tool/bssl speed -filter ML-KEM-512
Did 100000 ML-KEM-512 keygen operations in 1004174us (99584.3 ops/sec)
Did 90000 ML-KEM-512 encaps operations in 1003508us (89685.4 ops/sec)
Did 76000 ML-KEM-512 decaps operations in 1005610us (75576.0 ops/sec)

./tool/bssl speed -filter ML-KEM-768
Did 64000 ML-KEM-768 keygen operations in 1013068us (63174.4 ops/sec)
Did 59000 ML-KEM-768 encaps operations in 1001588us (58906.5 ops/sec)
Did 50000 ML-KEM-768 decaps operations in 1004410us (49780.5 ops/sec)
./tool/bssl speed -filter ML-KEM-1024
Did 43000 ML-KEM-1024 keygen operations in 1011250us (42521.6 ops/sec)
Did 40000 ML-KEM-1024 encaps operations in 1003052us (39878.3 ops/sec)
Did 34000 ML-KEM-1024 decaps operations in 1006070us (33794.9 ops/sec)

x86 ASM ML-KEM with C SHA3 Implementation

./tool/bssl speed -filter ML-KEM-512
Did 94000 ML-KEM-512 keygen operations in 1002326us (93781.9 ops/sec)
Did 86000 ML-KEM-512 encaps operations in 1008614us (85265.5 ops/sec)
Did 71000 ML-KEM-512 decaps operations in 1000128us (70990.9 ops/sec)
./tool/bssl speed -filter ML-KEM-768
Did 60000 ML-KEM-768 keygen operations in 1009038us (59462.6 ops/sec)
Did 56000 ML-KEM-768 encaps operations in 1005365us (55701.2 ops/sec)
Did 47000 ML-KEM-768 decaps operations in 1012998us (46396.9 ops/sec)
./tool/bssl speed -filter ML-KEM-1024
Did 40000 ML-KEM-1024 keygen operations in 1015188us (39401.6 ops/sec)
Did 38000 ML-KEM-1024 encaps operations in 1025663us (37049.2 ops/sec)
Did 32000 ML-KEM-1024 decaps operations in 1011496us (31636.3 ops/sec)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@codecov-commenter

codecov-commenter commented Aug 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.73%. Comparing base (2980c1b) to head (39e89e5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2619      +/-   ##
==========================================
- Coverage   78.74%   78.73%   -0.02%     
==========================================
  Files         646      646              
  Lines      111300   111304       +4     
  Branches    15714    15715       +1     
==========================================
- Hits        87647    87634      -13     
- Misses      22961    22976      +15     
- Partials      692      694       +2     


andrewhop pushed a commit that referenced this pull request Aug 25, 2025
Bit interleaving (BIT_INTERLEAVE) is a performance optimization for 32-bit
platforms, and it adds unnecessary complexity.

### Issues:
Some Windows compilers, e.g., older versions of Microsoft Visual C++
(MSVC), do not support certain preprocessor directives and expressions,
such as:

```
// Double-check that bit-interleaving is not used on AArch64
#if BIT_INTERLEAVE != 0
#error Bit-interleaving of Keccak1600 states should be disabled for AArch64
#endif
```

in https://github.com/aws/aws-lc/blob/d781046a99638d1466ec912cf0191d0564de2084/crypto/fipsmodule/sha/keccak1600.c#L422

A solution could be:

```
#if defined(BIT_INTERLEAVE) && BIT_INTERLEAVE
  #error Bit-interleaving of Keccak1600 states should be disabled for AArch64
#endif
```
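One way to apply the proposed `defined(...) &&` guard consistently is to normalize the macro once into a flag that is always defined. Later `#if` lines then never name a possibly undefined identifier. This is only a sketch: `EXAMPLE_BIT_INTERLEAVE_ENABLED` and `example_bit_interleave_enabled` are hypothetical names. Whether this pattern satisfies a particular MSVC version is an assumption to verify there. The commit below ultimately removes BIT_INTERLEAVE instead.

```c
/* Collapse "defined and non-zero" into a single 0/1 flag. */
#if defined(BIT_INTERLEAVE) && BIT_INTERLEAVE
#define EXAMPLE_BIT_INTERLEAVE_ENABLED 1
#else
#define EXAMPLE_BIT_INTERLEAVE_ENABLED 0
#endif

/* The AArch64 sanity check, expressed through the normalized flag. */
#if defined(OPENSSL_AARCH64) && EXAMPLE_BIT_INTERLEAVE_ENABLED
#error Bit-interleaving of Keccak1600 states should be disabled for AArch64
#endif

/* Exposes the flag at run time, e.g. for a unit test. */
int example_bit_interleave_enabled(void) {
  return EXAMPLE_BIT_INTERLEAVE_ENABLED;
}
```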

However, BIT_INTERLEAVE is intended only to optimize 32-bit platforms;
it adds complexity to the code without providing much benefit.

Therefore, removing BIT_INTERLEAVE support is the better solution for
clarity and maintainability.
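To show what is being removed, the sketch below illustrates bit interleaving in general. It is not the code that was deleted from keccak1600.c, and all names are hypothetical. Each 64-bit lane is split into two 32-bit words holding its even-indexed and odd-indexed bits. Keccak's 64-bit lane rotations can then be performed with two 32-bit rotations, which is what makes the technique attractive on 32-bit CPUs.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n) {
  n &= 31;
  /* Masking the right shift keeps n == 0 well-defined (no shift by 32). */
  return (x << n) | (x >> ((32 - n) & 31));
}

static uint64_t rotl64(uint64_t x, unsigned n) {
  n &= 63;
  return (x << n) | (x >> ((64 - n) & 63));
}

/* Splits a 64-bit lane into its even-indexed and odd-indexed bits. */
void lane_interleave(uint64_t x, uint32_t *even, uint32_t *odd) {
  uint32_t e = 0, o = 0;
  for (unsigned i = 0; i < 32; i++) {
    e |= (uint32_t)((x >> (2 * i)) & 1) << i;
    o |= (uint32_t)((x >> (2 * i + 1)) & 1) << i;
  }
  *even = e;
  *odd = o;
}

/* Rotates an interleaved lane left by r using only 32-bit rotations. */
void lane_rotl_interleaved(uint32_t *even, uint32_t *odd, unsigned r) {
  unsigned k = (r & 63) / 2;
  uint32_t e = *even, o = *odd;
  if ((r & 1) == 0) {
    *even = rotl32(e, k);
    *odd = rotl32(o, k);
  } else {
    /* Odd rotations swap the halves: odd bits land on even positions. */
    *even = rotl32(o, k + 1);
    *odd = rotl32(e, k);
  }
}

/* Returns 1 if interleave-then-rotate equals rotate-then-interleave
 * for every rotation amount 0..63. */
int lane_interleave_selfcheck(uint64_t x) {
  for (unsigned r = 0; r < 64; r++) {
    uint32_t e, o, e2, o2;
    lane_interleave(x, &e, &o);
    lane_rotl_interleaved(&e, &o, r);
    lane_interleave(rotl64(x, r), &e2, &o2);
    if (e != e2 || o != o2) return 0;
  }
  return 1;
}
```

The extra conversion and the odd/even case split are the kind of complexity the commit message refers to. On 64-bit targets, a native 64-bit rotate makes all of this unnecessary.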


### Description of changes: 
Remove all support for BIT_INTERLEAVE.

### Call-outs:
This change is motivated by the integration of x86 Keccak into
aws-lc (#2619), which fails when run on x86 Windows platforms.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.
@manastasova manastasova marked this pull request as ready for review August 27, 2025 15:50
@manastasova manastasova requested a review from a team as a code owner August 27, 2025 15:50
@dkostic dkostic requested review from dkostic and nebeid August 28, 2025 16:46
dkostic
dkostic previously approved these changes Aug 28, 2025
@@ -349,7 +351,7 @@ void KeccakF1600(uint64_t A[KECCAK1600_ROWS][KECCAK1600_ROWS]) {
// Neoverse V1 and V2 do support SHA3 instructions, but they are only
// implemented on 1/4 of Neon units, and are thus slower than a scalar
// implementation.

#if defined(OPENSSL_AARCH64)
Contributor


Nit: Maybe you can combine this line and the one below with && to match l.429.
When you add the brackets per Dusan's comment, you could make this small change too, if you want.

Contributor Author


Actually, these two conditionals do not end at the same place. There is an additional non-s2n-bignum assembly implementation for Arm, so the code should fall back to it when AArch64 is detected but s2n-bignum is not used.
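The author's point can be sketched as an `#if`/`#elif` chain in which AArch64 appears twice: once with s2n-bignum, and once as a fallback to the other Arm assembly. Merging the two AArch64 conditions with && would drop that fallback. This is a hypothetical illustration, not the actual keccak1600.c code. `EXAMPLE_HAVE_S2N_BIGNUM` stands in for whatever condition enables s2n-bignum. `OPENSSL_X86_64` and `OPENSSL_WINDOWS` are assumed by analogy with `OPENSSL_AARCH64`, and the Windows exclusion follows the PR's "x86 (with UNIX)" scope.

```c
/* Reports which Keccak backend the preprocessor would select. */
const char *keccak_backend_name(void) {
#if defined(OPENSSL_X86_64) && !defined(OPENSSL_WINDOWS) && \
    defined(EXAMPLE_HAVE_S2N_BIGNUM)
  return "s2n-bignum x86_64 assembly";
#elif defined(OPENSSL_AARCH64) && defined(EXAMPLE_HAVE_S2N_BIGNUM)
  return "s2n-bignum AArch64 assembly";
#elif defined(OPENSSL_AARCH64)
  /* AArch64 without s2n-bignum still has a separate assembly path. */
  return "other AArch64 assembly";
#else
  return "portable C";
#endif
}
```

Compiled without any of these macros defined, the function reports the portable C backend.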
